This document details the analyses and provides results for the primary analyses of the VIBRANT trial.
Trial data
The data used for these analyses are stored in an R Bioconductor MultiAssayExperiment object which contains the QCed and pre-processed data. The code and description of how raw data have been processed and QCed is available on the VIBRANT-0-QC-and-preprocessing GitHub repository (Still private at the moment, but will be made public upon publication, and anyone wishing to have access can email Laura Symul).
Loading the RDS file containing the unblinded VIBRANT MAE: /04 unblinded MAEs/mae_full_20250624.rds
Code
mae <-readRDS(mae_file)rm(mae_file)
Demographics (Table 1 & suppl. tables by sites)
Summary table
The demographics table is built as follow.
We first create a table1_datadata.frame from the participant_crfs_merged table stored in the @metadata slot of the mae.
This table contains almost all variables needed to create the demographics table (i.e., site, age, race, ethn, cut_size_meals, eat_less, hungry_did_not_eat, sexual_partners_lifetime, sexual_partners_past_month, contraception_method).
In addition, the mae@colData contains the participants’ arm and study population flags (here we keep the mITT population).
From the variables listed above, we create the variable food_insecurity using the three variables cut_size_meals, eat_less and hungry_did_not_eat: if a participant answered “Yes” to any of the corresponding questions, then she is considered to be affected by food insecurity.
table1_data |>ggplot(aes(x = food_insecurity)) +geom_bar() +labs(x="Food insecurity", y ="Count")+theme_classic()
In addition to these variables, we also include the participants’ partner’s gender.
Participants are asked about their partner’s gender at all weekly visits (1000, 1100, 1200, 1300, 1400, 1500, 1700, 1900, 2120) using CRF 19.
Answers to that question can be found in the past_week_sex_partner column of the mae@metadata$visits_crfs_merged table. Participants’ answer throughout the weekly visits will be summarized into one of these 4 categories:
No sex
Only male
Only female
Both
This variable is added to the table1_data table.
Code
gender_sexual_partner <-metadata(mae)$visits_crfs_merged |>select(pid, past_week_sex_partner) |>filter(!is.na(past_week_sex_partner)) |>group_by(pid) |>summarise(gender_sexual_partner =case_when(all(past_week_sex_partner =="No sex in the past week") ~"No sex",any(past_week_sex_partner %in%c("A single male partner", "Multiple male partners")) &!any(past_week_sex_partner %in%c("A single female partner", "Multiple female partners")) ~"Only man",any(past_week_sex_partner %in%c("A single female partner", "Multiple female partners")) &!any(past_week_sex_partner %in%c("A single male partner", "Multiple male partners")) ~"Only female",any(past_week_sex_partner %in%c("A single male partner", "Multiple male partners")) &any(past_week_sex_partner %in%c("A single female partner", "Multiple female partners")) ~"Both",TRUE~NA_character_ ),.groups ="drop" )table1_data <- table1_data |>left_join(gender_sexual_partner, by ="pid")
We also re-code the race variable as follows:
“Asian or South Asian” is re-coded to “Asian” (for brevity)
“Black, African American, or African” and “African” are re-coded to “Black/African” (for brevity)
“Coloured”, “White”, “Other”, and “Prefered not to answer” are left as is.
Note
@Disebo & Caroline: do we go for British or American English? (Colored vs Coloured)?
Code
table1_data <- table1_data |>mutate(across(where(is.character), as.factor),race =fct_recode( race,"Asian"="Asian or South Asian","Black/African"="Black, African American, or African","Black/African"="African","Coloured"="Coloured","White"="White","Other"="Other","Prefer not to answer"="Prefer not to answer" ),race =fct_relevel(race, "Asian", "Black/African", "Coloured", "White", "Other", "Prefer not to answer"),race = race |>fct_drop(),arm = arm |>fct_drop() ) |>set_variable_labels(food_insecurity ="Report food insecurity in past 12 months ",sexual_partners_past_month ="Number of partners in past month",sexual_partners_lifetime ="Number of partners in lifetime",ethn ="Ethnicity", contraception_method ="Contraception", gender_sexual_partner ="Gender of sexual partners" )
Warning
@Laura Vermeren: I changed the summary statistics from mean to median for the continuous variables (for number of past sexual partner, the mean was artificially high in the arm with the 90 partner MGH participant). Would you be able to check that the tests selected by the gt_summary package are appropriate for the data we have? They should be, but I’d love if we could double-check :)
2 Kruskal-Wallis rank sum test; Fisher’s exact test
Study population characteristics appear balanced across arms at the CAPRISA site. Given the low number of participants at the MGH site in two of the five arms, we create an additional table excluding these two arms.
2 Kruskal-Wallis rank sum test; Fisher’s exact test
Besides the number of lifetime sexual partners, arms appear balanced across arm at MGH.
Primary outcomes (Table 2)
LBP colonization by week 5
The primary outcome is defined as “Colonization by any of the LBP strains after product administration by week 5”, and colonization by any of the LBP strain is defined as a relative abundance of 5% by any single strain or 10% total relative abundance of the LBP strains. The relative abundance of LBP strain is estimated using metagenomics (taxonomic composition estimated using VIRGO2 and strain proportion of total L. crispatus is estimated using kSanity), and includes samples from weekly visits 1200 to 1500 (week 2 (post-INT) to week 5 (3 weeks post-INT)).
test_results <- fisher_test_results |>mutate(adj_p_value = adj_p_value |> scales::pvalue(accuracy =0.001),OR = OR |>round(2) |>as.character(),CI =str_c(lower_CI |>round(2), " - ", upper_CI |>round(2)) ) |>select(arm, adj_p_value, OR, CI) |>pivot_longer(cols =-arm, names_to ="name", values_to ="value") |>pivot_wider(names_from = arm, values_from = value) |>mutate(name =case_when( name =="adj_p_value"~"Adj. p-value", name =="OR"~"OR", name =="CI"~"95% CI",TRUE~ name ) )
Code
table_2_with_model_results <-bind_rows( table_2, test_results |>filter(name !="Adj. p-value") ) |>mutate(site = site |>str_replace_na(""),Pl =case_when( name =="Adj. p-value"~"Ref.", name =="OR"~"Ref.",str_detect(name, "CI") ~"",TRUE~ Pl ) )
Code
table_2_with_model_results |>group_by(site) |>gt(caption ="LBP Colonization by week 5 by arm, with colonization definition based on qPCR-estimated relative abundance.", row_group_as_column =TRUE ) |>cols_width("name"~px(150),everything() ~px(120) ) |>cols_label(name ="",Pl ="Placebo",`LC-106-7`="LC-106<br>7 days",`LC-106-3`="LC-106<br>3 days",`LC-106-o`="LC-106<br>overlap",`LC-115`="LC-115<br>7 days",.fn = md )
LBP Colonization by week 5 by arm, with colonization definition based on qPCR-estimated relative abundance.
Placebo
LC-106 7 days
LC-106 3 days
LC-106 overlap
LC-115 7 days
CAP
N participants
N = 14
N = 14
N = 14
N = 14
N = 14
n (%) participants with LBP strain detected
2 (14 %)
11 (79 %)
8 (57 %)
8 (57 %)
10 (71 %)
95% CI
49% - 95%
29% - 82%
29% - 82%
42% - 92%
MGH
N participants
N = 5
N = 7
N = 1
N = 1
N = 6
n (%) participants with LBP strain detected
1 (20 %)
6 (86 %)
1 (100 %)
1 (100 %)
6 (100 %)
95% CI
42% - 100%
3% - 100%
3% - 100%
54% - 100%
OR
Ref.
20.25
7.44
7.44
19.07
95% CI
4.44 - Inf
1.62 - Inf
1.62 - Inf
4.15 - Inf
LBP detection by week 5, qPCR-based detection
In this analysis, LBP is considered to be detected if at least 2 LBP strains had observed Cq values at the same visit by week 5. “By week 5” includes all post-product visits from visit_code = 1200 (no excluding the immediate post-product visits.
detected_by_week5_qPCR |>arrange(site, arm, LBP_detected_by_week5) |>group_by(site, arm) |>mutate(participant_number =row_number() |>factor()) |>ungroup() |>ggplot(aes(x = participant_number, y = site |>fct_rev(), col = site, fill = site, shape = LBP_detected_by_week5)) +geom_point(size =3) +facet_grid(. ~ arm, scales ="free_x", space ="free_x") +guides(color ="none", fill ="none") +scale_shape_manual("Colonized by LBP by week 5", values =c(1, 16)) +scale_color_manual(values = site_colors) +xlab("Participant number") +ylab("") +ggtitle("Colonization based on qPCR-based detection of at least 2 LBP strains detected at the same visit by week 5") +theme(legend.position ="top",axis.text.x =element_text(angle =90, hjust =1, vjust =0.5), plot.title =element_text(hjust =0.5) )
test_results <- fisher_test_results |>mutate(adj_p_value = adj_p_value |> scales::pvalue(accuracy =0.001),OR = OR |>round(2) |>as.character(),CI =str_c(lower_CI |>round(2), " - ", upper_CI |>round(2)) ) |>select(arm, adj_p_value, OR, CI) |>pivot_longer(cols =-arm, names_to ="name", values_to ="value") |>pivot_wider(names_from = arm, values_from = value) |>mutate(name =case_when( name =="adj_p_value"~"Adj. p-value", name =="OR"~"OR", name =="CI"~"95% CI",TRUE~ name ) )
Code
table_2_with_model_results <-bind_rows( table_2, test_results |>filter(name !="Adj. p-value") ) |>mutate(site = site |>str_replace_na(""),Pl =case_when( name =="Adj. p-value"~"Ref.", name =="OR"~"Ref.",str_detect(name, "CI") ~"",TRUE~ Pl ) )
Code
table_2_with_model_results |>group_by(site) |>gt(caption ="LBP detection by week 5 by arm, with LBP detection defined as at least two LBP strains simultaneously detected by qPCR.", row_group_as_column =TRUE ) |>cols_width("name"~px(150),everything() ~px(120) ) |>cols_label(name ="",Pl ="Placebo",`LC-106-7`="LC-106<br>7 days",`LC-106-3`="LC-106<br>3 days",`LC-106-o`="LC-106<br>overlap",`LC-115`="LC-115<br>7 days",.fn = md )
LBP detection by week 5 by arm, with LBP detection defined as at least two LBP strains simultaneously detected by qPCR.
Placebo
LC-106 7 days
LC-106 3 days
LC-106 overlap
LC-115 7 days
CAP
N participants
N = 14
N = 14
N = 14
N = 14
N = 14
n (%) participants with LBP strain detected
2 (14 %)
12 (86 %)
8 (57 %)
9 (64 %)
11 (79 %)
95% CI
57% - 98%
29% - 82%
35% - 87%
49% - 95%
MGH
N participants
N = 5
N = 7
N = 1
N = 1
N = 6
n (%) participants with LBP strain detected
3 (60 %)
6 (86 %)
1 (100 %)
1 (100 %)
6 (100 %)
95% CI
42% - 100%
3% - 100%
3% - 100%
54% - 100%
OR
Ref.
15.28
4.01
5.29
14.44
95% CI
3.49 - Inf
0.99 - Inf
1.29 - Inf
3.28 - Inf
L. crispatus dominance by week 5
In this analysis, L. crispatus dominance is defined as a metagenomic-estimated relative abundance ≥ 50%. “By week 5” includes all post-product visits from visit_code = 1200 (no excluding the immediate post-product visits.
g_tot_prop_LBP_strains +geom_label(aes(label = pid_nb), size =2) +labs(caption ="Participants are labeled by their enrollment order within each arm and site." )
Proportions of participants who colonized in each arm and each visit
g_tot_prop_Lc +geom_label(aes(label = pid_nb), size =2) +labs(caption ="Participants are labeled by their enrollment order within each arm and site." )
Proportions of participants who colonized with 50% L. crispatus in each arm and each visit
Microbiota trajectories of mITT participants in the VIBRANT study. The relative abundances of the LBP strains and top taxa are shown. Participants are ordered by their total relative abundance of LBP strains, and the study weeks are shown in the rows. The alpha level (transparency) indicates whether the participant is part of the per-protocol (PP) or modified intention-to-treat (mITT) population.
Code
# checking the data for the placebo participant that has the LBP strains at MGHmae[["mg"]] |>as_tibble() |> dplyr::left_join(mae@colData |>as_tibble()) |>filter(pid =="068100062", study_week ==5) mae[["qPCR"]] |>as_tibble() |> dplyr::left_join(mae@colData |>as_tibble()) |>filter(pid =="068100062", study_week ==5) mae[["amplicon"]] |>as_tibble() |> dplyr::left_join(mae@colData |>as_tibble()) |>filter(pid =="068100062", study_week ==5) |>select(.feature, .sample, rel_ab, exclude_sample)mae[["amplicon"]] |>as_tibble() |> dplyr::left_join(mae@colData |>as_tibble()) |>filter(pid =="068100062", .feature =="Lactobacillus_crispatus") |>select(.feature, .sample, visit_code, rel_ab, exclude_sample)
tmp <-# we use the metagenomics data for this mae[["mg"]] |>as_tibble() |>filter(!is.na(LBP), !poor_quality_mg_data) |>select(.feature, LBP, strain_origin, .sample, uid, rel_ab) |># add the arm, mITT, etc.left_join( mae |>colData() |>as_tibble() |>select(uid, pid, site, randomized, mITT, arm, visit_code, visit, PIPV), by =join_by(uid) ) |># only consider the mITTfilter(randomized, mITT) |>filter(PIPV, as.numeric(visit_code) >=1200) |># only include visits after the first 2 weeks# define the outcome of interest for each person and straingroup_by(.feature, LBP, strain_origin, pid, arm, site) |>summarize(ever_colonized =any(rel_ab >=0.05, na.rm =TRUE),.groups ="drop" ) |># exclude Placebo participants because, they were, in theory, not exposed to any of the strains.filter(arm !="Pl") |># define the total number of participants exposed for each strain.# For LC-115 strains, this is only participants of arm LC-115# For LC-106 strains, this is participants from all active arms.mutate(exposed =case_when( LBP =="LC-115"~ (arm =="LC-115"), LBP =="LC-106 & LC-115"~TRUE,TRUE~FALSE ) ) |># we compute the statistics of interestgroup_by(.feature, LBP, strain_origin, site) |>summarize(n_exposed =sum(exposed, na.rm =TRUE),n_colonized =sum(ever_colonized & exposed, na.rm =TRUE),.groups ="drop" ) |>mutate(prop_colonized = n_colonized / n_exposed,CI_low = binom::binom.confint(n_colonized, n_exposed, method ="wilson")$lower,CI_high = binom::binom.confint(n_colonized, n_exposed, method ="wilson")$upper )
Code
tmp |>ggplot() +aes(x = .feature, y = prop_colonized, col = site) +facet_grid(. ~ LBP + strain_origin, scales ="free", space ="free") +geom_errorbar(aes(ymin = CI_low, ymax = CI_high), linewidth =1, alpha =0.7,width =0.3, position =position_dodge(width =0.5) ) +geom_point(position =position_dodge(width =0.5), size =2) +scale_color_manual("Study site", values = site_colors) +xlab("LBP strain") +ylab("Proportion of participants colonized\namong those exposed") +theme(axis.text.x =element_text(angle =45, hjust =1) )
Code
tmp |>mutate(CI =str_c("[", scales::percent(CI_low, accuracy =1), " - ", scales::percent(CI_high, accuracy =1), "]" ),value =str_c( scales::percent(prop_colonized, accuracy =1), " (", n_colonized, "/", n_exposed,")<br>", CI ),strain_info =str_c(LBP, "<br>(", strain_origin, " strains)") ) |>select(-n_exposed, -n_colonized, -prop_colonized, -CI_low, -CI_high, -CI, -LBP, -strain_origin) |>pivot_wider(names_from =c(site), values_from = value ) |>arrange(strain_info) |>group_by(strain_info) |>gt(row_group_as_column =TRUE, process_md =TRUE,caption ="Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure." ) |>cols_label(.feature ="LBP strain") |>cols_align(columns = .feature, align ="left") |>cols_align(columns =c("CAP", "MGH"), align ="center") |>fmt_markdown(columns =everything())
Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure.
LBP strain
CAP
MGH
LC-106 & LC-115 (SA strains)
FF00018
7% (4/56) [3% - 17%]
20% (3/15) [7% - 45%]
FF00051
30% (17/56) [20% - 43%]
67% (10/15) [42% - 85%]
UC101
25% (14/56) [16% - 38%]
73% (11/15) [48% - 89%]
LC-106 & LC-115 (US strains)
C0022A1
25% (14/56) [16% - 38%]
47% (7/15) [25% - 70%]
C0059E1
61% (34/56) [48% - 72%]
87% (13/15) [62% - 96%]
C0175A1
39% (22/56) [28% - 52%]
67% (10/15) [42% - 85%]
LC-115 (SA strains)
122010
57% (8/14) [33% - 79%]
33% (2/6) [10% - 70%]
185329
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
FF00004
50% (7/14) [27% - 73%]
83% (5/6) [44% - 97%]
FF00064
50% (7/14) [27% - 73%]
83% (5/6) [44% - 97%]
FF00072
7% (1/14) [1% - 31%]
50% (3/6) [19% - 81%]
UC119
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
LC-115 (US strains)
C0006A1
50% (7/14) [27% - 73%]
67% (4/6) [30% - 90%]
C0028A1
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
C0112A1
64% (9/14) [39% - 84%]
83% (5/6) [44% - 97%]
Excluding immediate post-product visit
Code
tmp <-# we use the metagenomics data for this mae[["mg"]] |>as_tibble() |>filter(!is.na(LBP), !poor_quality_mg_data) |>select(.feature, LBP, strain_origin, .sample, uid, rel_ab) |># add the arm, mITT, etc.left_join( mae |>colData() |>as_tibble() |>select(uid, pid, site, randomized, mITT, arm, visit_code, visit, PIPV), by =join_by(uid) ) |># only consider the mITTfilter(randomized, mITT) |>mutate(include_visit =case_when( (arm %in%c("LC-106-7", "LC-115")) &as.numeric(visit_code) >1200~TRUE, (arm %in%c("LC-106-3", "LC-106-o")) &as.numeric(visit_code) >=1200~TRUE,TRUE~FALSE ) ) |>filter(PIPV, include_visit) |># only include visits after the first 2 weeks# define the outcome of interest for each person and straingroup_by(.feature, LBP, strain_origin, pid, arm, site) |>summarize(ever_colonized =any(rel_ab >=0.05, na.rm =TRUE),.groups ="drop" ) |># exclude Placebo participants because, they were, in theory, not exposed to any of the strains.filter(arm !="Pl") |># define the total number of participants exposed for each strain.# For LC-115 strains, this is only participants of arm LC-115# For LC-106 strains, this is participants from all active arms.mutate(exposed =case_when( LBP =="LC-115"~ (arm =="LC-115"), LBP =="LC-106 & LC-115"~TRUE,TRUE~FALSE ) ) |># we compute the statistics of interestgroup_by(.feature, LBP, strain_origin, site) |>summarize(n_exposed =sum(exposed, na.rm =TRUE),n_colonized =sum(ever_colonized & exposed, na.rm =TRUE),.groups ="drop" ) |>mutate(prop_colonized = n_colonized / n_exposed,CI_low = binom::binom.confint(n_colonized, n_exposed, method ="wilson")$lower,CI_high = binom::binom.confint(n_colonized, n_exposed, method ="wilson")$upper )
Code
tmp |>ggplot() +aes(x = .feature, y = prop_colonized, col = site) +facet_grid(. ~ LBP + strain_origin, scales ="free", space ="free") +geom_errorbar(aes(ymin = CI_low, ymax = CI_high), linewidth =1, alpha =0.7,width =0.3, position =position_dodge(width =0.5) ) +geom_point(position =position_dodge(width =0.5), size =2) +scale_color_manual("Study site", values = site_colors) +xlab("LBP strain") +ylab("Proportion of participants colonized\namong those exposed") +theme(axis.text.x =element_text(angle =45, hjust =1) )
Code
tmp |>mutate(CI =str_c("[", scales::percent(CI_low, accuracy =1), " - ", scales::percent(CI_high, accuracy =1), "]" ),value =str_c( scales::percent(prop_colonized, accuracy =1), " (", n_colonized, "/", n_exposed,")<br>", CI ),strain_info =str_c(LBP, "<br>(", strain_origin, " strains)") ) |>select(-n_exposed, -n_colonized, -prop_colonized, -CI_low, -CI_high, -CI, -LBP, -strain_origin) |>pivot_wider(names_from =c(site), values_from = value ) |>arrange(strain_info) |>group_by(strain_info) |>gt(row_group_as_column =TRUE, process_md =TRUE,caption ="Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure but excluding immediate post-product visit (i.e., excluding week 2 visit for the LC-106-7 and LC-115 arms)." ) |>cols_label(.feature ="LBP strain") |>cols_align(columns = .feature, align ="left") |>cols_align(columns =c("CAP", "MGH"), align ="center") |>fmt_markdown(columns =everything())
Proportion of mITT participants exposed to a given strain who colonized with that strain (colonization defined as a ≥ 5% relative abundance by metagenomics) at any weekly visit after product exposure but excluding immediate post-product visit (i.e., excluding week 2 visit for the LC-106-7 and LC-115 arms).
LBP strain
CAP
MGH
LC-106 & LC-115 (SA strains)
FF00018
2% (1/56) [0% - 9%]
7% (1/15) [1% - 30%]
FF00051
9% (5/56) [4% - 19%]
20% (3/15) [7% - 45%]
UC101
2% (1/56) [0% - 9%]
7% (1/15) [1% - 30%]
LC-106 & LC-115 (US strains)
C0022A1
21% (12/56) [13% - 34%]
33% (5/15) [15% - 58%]
C0059E1
43% (24/56) [31% - 56%]
40% (6/15) [20% - 64%]
C0175A1
39% (22/56) [28% - 52%]
67% (10/15) [42% - 85%]
LC-115 (SA strains)
122010
36% (5/14) [16% - 61%]
17% (1/6) [3% - 56%]
185329
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
FF00004
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
FF00064
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
FF00072
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
UC119
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
LC-115 (US strains)
C0006A1
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
C0028A1
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
C0112A1
0% (0/14) [0% - 22%]
0% (0/6) [0% - 39%]
Supplementary analyses
TODO
Applicator staining
Numbers
Proportion of participants that returned applicators
Proportion of participants with returned applicators for which the staining analysis was done
Staining analysis
mITT
Concordance between self-reports and applicator staining:
scatter plot of # of self-reported doses vs # of positive staining
Number of participants that self-reported using at least one dose (= XX)
Number of participants that had at least one positive staining (= YY)
contingency table of self-reported doses and positive staining
Proportion of participants self-reported using at least one dose and have at least one positive staining (= ZZ/XX)
PP
Same but with at least 6 doses
Time since last dose in 7-days active arms (LC-106-7, LC-115)
We compute this time based on self-reported time of last study product use (daily - CRF plate 32/33) -> date/time of last study product = use the crf 32/33 visit_day + answer to when? (early morning, etc.) -> swab time = use study_day of the weekly visit.
LBP colonization in participants with BV post-MTZ
see “After oral metronidazole treatment, 22/70 (30%) people still had a Nugent score of 7 or higher, indicating a failure to resolve BV. Among the 16 of these people exposed to LBP, ZZ had a detectable LBP strain identified after treatment by metagenomics, and TT by qPCR during treatment.”